A PagerDuty Service may represent an application, component, or team you wish to open incidents against. PagerDuty enables you to model your services as they actually exist within your infrastructure. You can drive more clarity and accountability by aligning services with component ownership so that the alerts your responders receive are not just from siloed tools.
We recommend keeping the following items in mind when setting up your services:
Reduce confusion with services and integrations by agreeing on naming conventions
It’s common for multiple teams across an organization to use the same tools, so it isn’t helpful for each team to name one of its services “Datadog”. Get together with your team and agree on naming conventions that everyone can use to distinguish their services from another team’s services.
Some things you might want to include in your service names:
- Team names
- Business unit
- Production environments
- Priority level
- Integration/Monitoring tool name
- Customer name
Including additional names and keywords in your service names will also help narrow down results when you search for a service by name.
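One way to keep names consistent is to build them from the same components in the same order. The following is a minimal sketch, not a PagerDuty feature: the function `build_service_name`, its parameters, and the " - " separator are all hypothetical choices made for illustration.

```python
def build_service_name(team, tool, environment=None, priority=None, separator=" - "):
    """Join the non-empty naming components in a fixed order.

    Hypothetical convention: team, environment, tool, priority.
    """
    parts = [team, environment, tool, priority]
    # Drop components that are missing or blank so the separator never doubles up.
    cleaned = [p.strip() for p in parts if p and p.strip()]
    if not cleaned:
        raise ValueError("at least one naming component is required")
    return separator.join(cleaned)


print(build_service_name("Payments", "Datadog", environment="Production", priority="P1"))
print(build_service_name("Search", "Datadog"))
```

With a helper like this, two teams that both use Datadog end up with names that differ by team and environment rather than two services both called “Datadog”.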
Add descriptions to your services
Make sure that everybody knows what types of incidents are supposed to trigger on a service by adding a description to it. To add a description, go to the service, click Edit, and enter the description in the box under the service name.
Review your timeout settings
There are two timeout settings on a service:
- Incident ack timeout: determines when users should be re-notified if an incident has been acknowledged for too long. The default is 30 minutes.
- Auto-resolve timeout: determines when a PagerDuty incident should automatically resolve itself. The default is 4 hours.
You can change the threshold of these settings or disable them completely based on your use case. For example, if incidents generally take longer than 4 hours to resolve, you may want to increase this threshold or disable it altogether.
Similarly, if incidents take longer than 30 minutes to investigate and resolve, a 30-minute incident ack timeout is not helpful. As a best practice, the ack timeout should reflect the average length of time it takes to resolve an incident. On-call responders can then choose to snooze the incident if they need more time.
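The arithmetic behind these two settings can be sketched as follows. This is a minimal sketch, assuming that the service object in the PagerDuty REST API expresses both timeouts in seconds under the field names `acknowledgement_timeout` and `auto_resolve_timeout`, with a null value meaning the timeout is disabled; please verify this against the current API reference. The helper `timeout_settings` is hypothetical and only builds the values; it does not call the API.

```python
def timeout_settings(ack_minutes=30, auto_resolve_hours=4):
    """Convert human-friendly timeouts to seconds.

    Defaults mirror the ones described above (30 minutes and 4 hours).
    Pass None for either argument to represent a disabled timeout.
    """
    return {
        # Assumed field names; None stands for "disabled".
        "acknowledgement_timeout": None if ack_minutes is None else int(ack_minutes * 60),
        "auto_resolve_timeout": None if auto_resolve_hours is None else int(auto_resolve_hours * 3600),
    }


# Defaults: 30 minutes and 4 hours, expressed in seconds.
print(timeout_settings())

# Incidents usually take about 90 minutes, so raise the ack timeout
# and disable auto-resolve entirely.
print(timeout_settings(ack_minutes=90, auto_resolve_hours=None))
```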
Cut the incident noise with email management rules
If you’re using an email integration, take advantage of email management rules. Setting up these rules will allow you to control which incidents are triggered, making sure that only the important ones notify your on-call team, and will also allow you to more accurately report on incident resolution times.
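The idea behind such a rule can be illustrated outside of PagerDuty. The sketch below is not how email management rules are configured in the product; it only shows the kind of decision a rule makes. The pattern `TRIGGER_SUBJECT` and the function `should_trigger` are hypothetical.

```python
import re

# Hypothetical rule: only subjects containing CRITICAL or DOWN should notify on-call.
TRIGGER_SUBJECT = re.compile(r"\b(CRITICAL|DOWN)\b", re.IGNORECASE)


def should_trigger(subject):
    """Return True if an email with this subject should open an incident."""
    return bool(TRIGGER_SUBJECT.search(subject))


for subject in ["CRITICAL: db01 disk full", "Host web01 is down", "Weekly usage report"]:
    print(subject, "->", "trigger" if should_trigger(subject) else "ignore")
```

In this sketch the weekly report is ignored, so it never reaches the on-call team.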
Connect your services to your chat and collaboration tools
Webhooks send out HTTP callbacks when interesting events happen to incidents within your PagerDuty services.
Integrating your services with your chat and collaboration tools via webhooks is a great way to create transparency around your incidents.
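To give a sense of what a receiving tool might do with a callback, here is a minimal sketch that turns a webhook payload into a one-line chat message. It assumes a payload shaped like PagerDuty's v3 webhooks, with an `event` object containing `event_type` and a `data` object holding the incident's `title`, `html_url`, and `service`; please confirm the exact shape against the webhook documentation. The function `format_chat_message` and the sample payload are hypothetical.

```python
def format_chat_message(payload):
    """Build a short chat message from a webhook payload (assumed v3-style shape)."""
    event = payload.get("event", {})
    data = event.get("data", {})
    event_type = event.get("event_type", "unknown event")
    title = data.get("title", "(no title)")
    service = data.get("service", {}).get("summary", "unknown service")
    url = data.get("html_url", "")
    # Append the link only when one is present.
    return f"[{event_type}] {title} on {service} {url}".strip()


# Invented sample payload, for illustration only.
sample = {
    "event": {
        "event_type": "incident.triggered",
        "data": {
            "title": "Disk usage above 90%",
            "html_url": "https://example.pagerduty.com/incidents/PXXXXXX",
            "service": {"summary": "Payments API"},
        },
    }
}

print(format_chat_message(sample))
```

Using `.get` with fallbacks keeps the formatter from failing if a field is missing from the payload.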
Add multiple integrations to a service
You can add more than one integration to a PagerDuty service, and may want to do so if several integrations are monitoring the same piece of infrastructure. Learn how to add multiple integrations to a service here.
For additional tips on how to use multiple integrations to best represent your internal systems, please check out our best practices article here.